This study used latent class analysis with data from the 2002/03 Head Start Impact
Study to determine whether early childhood classrooms can be sorted into classroom 
quality groups based on their scores on multiple measures of quality, how many 
classroom quality groups could be identified, and what percentages of classrooms 
fall within each classroom quality group. Based on the 13 measures examined, Head 
Start and center-based early childhood classrooms could be grouped into three distinct 
classroom quality groups: good, fair, and poor. Classroom quality measures determined 
by independent observers distinguish classroom quality groups better than self-reported 
measures do. 


This Stated Briefly report summarizes the findings of Irwin, C. W., Madura, J. P., Bamat, D., & 
McDermott, P. A. (2016). Patterns of classroom quality in Head Start and center-based early childhood 
education programs (REL 2017-199). Washington, DC: U.S. Department of Education, Institute of 
project.asp?projectID=4517.
Why this study?


Numerous early childhood studies have found a positive relationship between high-quality early child- 
hood education classrooms and favorable academic outcomes, language development, and social skills (for 
example, Burchinal, Kainz, & Cai, 2011; Burchinal et al., 2009; Keys et al., 2013; National Institute of 
Child Health and Human Development Early Child Care Research Network, 2000; Peisner-Feinberg et al., 
2001; and Weiland & Yoshikawa, 2013). The importance of providing high-quality early childhood educa- 
tion to all children, especially children from low-income families, is gaining national attention. In a 2014 
Gallup poll 70 percent of Americans reported that they support using federal funds to make high-quality 
preschool available to all children (McCann, 2014). In addition, grant competitions such as the Race to the 
Top—Early Learning Challenge and the President’s Preschool for All and Preschool Development Grants 
initiatives have brought increased attention to measuring the quality of early childhood education. These 
grants may focus on quality because of the evidence of a significant relationship between high-quality
learning experiences and favorable child outcomes, as found in the studies cited above.


Measuring classroom quality and ensuring high-quality learning experiences for young children are key 
interests of the Regional Educational Laboratory Northeast & Islands Early Childhood Education Research 
Alliance, whose members include staff of state agencies that oversee quality improvement initiatives for 
early childhood education. Alliance members wanted to better understand ways of measuring classroom 
quality and of identifying which measures best differentiate between high- and low-quality classrooms (see 
box 1 for definitions of key terms). Practitioners could also use classroom quality groups to identify profes- 
sional development opportunities that may be suitable for a particular set of classroom teachers. Regional 
Educational Laboratory Northeast & Islands conducted this study in collaboration with the research alli- 
ance to address these interests. 


What the study examined 


The main purpose of this study was to determine whether early childhood classrooms can be sorted into 
classroom quality groups based on their scores on multiple measures of quality, how many classroom quality 
groups could be identified, and what percentages of classrooms fall within each classroom quality group. 
Latent class analysis and data from the 2002/03 Head Start Impact Study were used (U.S. Department of 
Health and Human Services, 2015). The sample included 1,061 Head Start classrooms and 421 center-based 
classrooms (29 classrooms were removed because of missing data, leaving 1,453 classrooms in the analysis). 
The study also used 13 measures of quality (5 self-reported measures and 8 observation-based measures).


The study was guided by two research questions: 
• How many quality patterns are evident across classrooms, and what percentage of classrooms follow
each pattern?
• Which measures of quality contribute to the identification of classroom quality groups?


Five of the 13 measures of quality were self-reported: two measures of conflict and closeness that are part
of the Student-Teacher Relationship Scale (Pianta, 2001; measured on a scale of 1-5) and three measures of
academic activities that indicate the number of literacy, math, and other activities offered in the classroom
at least three times per week (measured on a scale of 0-12 for literacy activities, 0-7 for math activities,
and 0-4 for other activities). Eight of the 13 measures were observation-based: six measures of the classroom
environment that are part of the Early Childhood Environment Rating Scale-Revised (Harms, Clifford, & Cryer,
1998; measured on a scale of 1-7) and two measures of harshness and sensitivity in teacher interactions
with students that are part of the Arnett Caregiver Interaction Scale (Arnett, 1989; measured on a scale of
1-4). Detailed descriptions of these measures are in the appendix.


Box 1. Key terms 


Center-based early childhood classroom. An early learning environment that takes place in a formal setting 
outside a child’s or caregiver’s home, usually provided by an incorporated organization of professional child 
care providers. For this study, center-based care providers are distinct from Head Start providers and are the
centers attended by children who were eligible for Head Start but were randomly assigned not to receive Head
Start. Center-based programs vary programmatically: for example, some provide full-day services while others
provide part-day services, and some use an evidence-based curriculum while others do not use any curriculum 
at all; also, center-based programs vary in mission or purpose. 


Classroom quality. The structure, processes, and interactions in the early childhood classroom setting that 
have been associated with positive child outcomes in the research literature. 


Classroom quality group. A group of classrooms that display similar patterns of quality across 13 measures 
(see the appendix). 


Head Start. Established in 1965 as one of President Lyndon B. Johnson’s “war on poverty” initiatives, Head 
Start is a program administered by the U.S. Department of Health and Human Services that provides early 
childhood education, health, nutrition, and parent involvement services to low-income children from birth to age 
5 and their families. Like the center-based programs, Head Start programs vary programmatically—for example, 
some provide full-day services while others provide part-day services, and the specific curricula used vary.


Head Start Impact Study. A Congressionally mandated impact study conducted with a nationally representative 
sample of 4,667 3- and 4-year-old children who were eligible for Head Start and randomly assigned to either a 
participating Head Start program or a control group that did not have access to the participating Head Start pro- 
grams but could enroll in other early childhood programs selected by their families. Data are collected through 
classroom observations, teacher reports, and surveys. 


Instructional activities. Classroom experiences prepared and carried out by teachers to foster the cognitive, 
academic, and social-emotional development of children in their care. 


Latent class analysis. A multivariate statistical method for identifying groups that are empirically distinguishable 
based on different patterns of responses to observed measures. More specifically, it is a statistical analysis used 
to relate or assign a set of observations from a sample or population to unobservable (or latent) groups character- 
ized by a pattern of conditional probabilities that indicate the chance that characteristics take on certain values. 


Patterns of classroom quality. A distinctive combination of related measures of classroom quality, including 
teacher-child interactions, instructional activities, and physical environment. 


Teacher-child interactions. The nature of the exchanges between the teacher and child in a child care setting, 
including the teacher’s emotional tone, discipline style, and responsiveness, as well as other indicators of 
closeness or conflict between the teacher and child. 


This study shows a way to analyze classroom quality and provides an example of patterns of classroom 
quality in programs serving Head Start-eligible children across the country. Latent class analysis was used
to assign classrooms to unobservable (or latent) groups based on their scores across the 13 measures of 
quality. Latent class analysis uses patterns of conditional probabilities (which indicate the chance that 
characteristics take on certain values) to assign classrooms to different groups (see Irwin, Madura, Bamat, 
& McDermott, 2016, for a full description of the analyses). A central assumption of the technique is that 
all the observed variance in the measures is explained by the latent group membership. The findings can 
inform practitioners about what quality looks like in these settings and add to the literature on measuring 
quality in early childhood education. A second purpose of the study was to explore the extent to which 
each measure contributed to the identification of classroom quality groups. 
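
To make the assignment step concrete, here is a minimal Python sketch of how a fitted latent class model could assign one classroom to a group. It assumes each measure has been reduced to a binary high/low indicator and that the indicators are independent within a group. The group shares, the conditional probabilities, and the function names are all hypothetical illustrations, not estimates or code from the study (whose measures are ratings and counts rather than binary indicators).

```python
from math import prod

# Hypothetical group shares (prior probabilities); illustrative only.
PRIORS = {"good": 0.6, "fair": 0.3, "poor": 0.1}

# Hypothetical conditional probabilities that a classroom in each group
# scores "high" on each of three binary indicators; illustrative only.
P_HIGH = {
    "good": [0.9, 0.8, 0.9],
    "fair": [0.5, 0.4, 0.6],
    "poor": [0.1, 0.2, 0.1],
}


def posterior_membership(indicators):
    """Return P(group | indicators) by Bayes' rule, assuming the
    indicators are independent within each group."""
    joint = {}
    for group, prior in PRIORS.items():
        likelihood = prod(
            p if high else 1.0 - p
            for p, high in zip(P_HIGH[group], indicators)
        )
        joint[group] = prior * likelihood
    total = sum(joint.values())
    return {group: value / total for group, value in joint.items()}


def assign_group(indicators):
    """Assign the classroom to its most probable group."""
    posterior = posterior_membership(indicators)
    return max(posterior, key=posterior.get)
```

Under these made-up values, a classroom that scores high on all three indicators is assigned to the good group, and one that scores low on all three is assigned to the poor group. The posterior probabilities also show how confident each assignment is.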
What the study found


The study found that Head Start and center-based early childhood classrooms can be grouped into three 
distinct classroom quality groups displaying different patterns of classroom quality. 


There are three distinct classroom quality groups based on the 13 measures examined: good, fair, and poor quality 


The three classroom quality groups can be described as good, fair, and poor. Classrooms with similar pat- 
terns were grouped, and group average scores on each quality measure were used to describe the three class- 
room quality groups (figure 1 and table 1). Classrooms can be grouped based on their performance across 
all 13 measures of classroom quality. But on any given measure an individual classroom may have a score 
that deviates from the average for its group. So classrooms’ performance on each measure varies across 
classroom quality groups and within a given quality pattern group. For example, a classroom may have a 
low score on one measure, such as space and furnishings, from the Early Childhood Environment Rating 
Scale-Revised (Harms, Clifford, & Cryer, 1998), but still be in the good-quality group because of its scores 
on the other measures. The classroom quality groups are described as follows: 
• Good quality. This group is characterized by good to excellent classroom environment scores and
above-average sensitivity in teacher-child interactions scores. Teachers in this group also reported


Figure 1. The three distinct early childhood classroom quality groups each display a different 
average pattern of classroom quality, 2002/03 


[Figure 1 plots the average score on each classroom quality measure for the good, fair, and poor groups.]

Note: See the appendix for a full description of the measures.

a. Uses the Student-Teacher Relationship Scale (Pianta, 2001). 

b. From the spring 2003 Head Start Impact Study teacher survey. 

c. Uses the Early Childhood Environment Rating Scale—Revised (Harms, Clifford, & Cryer, 1998). 
d. Uses the Arnett Caregiver Interaction Scale (Arnett, 1989). 

Source: Authors’ analysis of data from the 2002/03 Head Start Impact Study. 
Table 1. Average rating on each classroom quality measure, by group membership, 2002/03

                                                 Full sample     Good-quality    Fair-quality    Poor-quality
                                                                 group           group           group
Measure                                          (n = 1,309-     (n = 787-       (n = 397-       (n = 95-
                                                 1,349) [a]      819) [a]        403) [a]        102) [a]

Student-teacher relationship scale [b] (scale of 1-5)
  Conflict                                       1.1             1.4             1.1             1.2
  Closeness                                      2.8             2.8             2.8             2.1

Instructional activities [c]
  Literacy activities (scale of 0-12)            7.4             7.6             7.4             6.1
  Math activities (scale of 0-7)                 5.4             5.6             5.4             4.2
  Other activities (scale of 0-4)                3.6             3.8             3.6             2.9

Classroom environment [d] (scale of 1-7)
  Space and furnishings                          5.0             5.5             4.4             3.5
  Personal care routines                         5.2             6.0             4.2             2.9
  Language-reasoning                             5.0             5.7             4.3             2.6
  Activities                                     4.4             5.0             3.5             2.7
  Interactions                                   5.6             6.3             5.1             2.4
  Program structure                              5.4             6.2             4.3             2.6

Sensitivity in teacher interactions with children [e] (scale of 1-4)
  Harshness                                      2.7             2.9             2.7             2.0
  Sensitivity                                    2.3             2.6             2.0             4.1


Note: See the appendix for a full description of the measures, which includes details regarding scale anchors. There were no
statistically significant differences in means across the three quality groups for any of the student-teacher relationship measures.
There were statistically significant differences between the good-quality group and the poor-quality group and between the
fair-quality group and the poor-quality group for all three instructional activities measures. There were statistically significant
differences across the three quality groups for all the classroom environment and sensitivity in teacher interactions with children
measures. The latent class variable explained the highest levels of observed variance in the observation-based measures and provided
consistently high separation among the groups on them; the observation-based measures were thus the measures that contributed the
most to sorting classrooms into the three quality groups.


a. The latent class analysis model failed to assign classrooms to groups only when data were missing for all 13 measures of
quality. As a result, the number of classrooms with values for each of the individual quality measures varies. The range of sample
sizes for each group represents the smallest and largest number of classrooms with values over the individual measures of quality.
The overall group membership remains the same as in the descriptions of the groups in the text: 901 for the good-quality group,
436 for the fair-quality group, and 116 for the poor-quality group.


b. Uses the Student-Teacher Relationship Scale (Pianta, 2001). 

c. From the spring 2003 Head Start Impact Study teacher survey. 

d. Uses the Early Childhood Environment Rating Scale—Revised (Harms, Clifford, & Cryer, 1998). 
e. Uses the Arnett Caregiver Interaction Scale (Arnett, 1989). 


Source: Authors’ analysis of data from the 2002/03 Head Start Impact Study. 


average closeness with their children, low conflict with their children, and an average number of 
instructional activities at least three times per week. Based on these features, this group could be 
described as a sensitive, environmentally rich classroom group. This group accounted for 62 percent 
of the classrooms examined in this study (n = 901). 

• Fair quality. This group is characterized by minimal to good classroom environment scores as well
as average sensitivity and harshness in teacher-child interactions scores. Teachers in this group
also reported average closeness with their children. This group differs from the good-quality group
in having lower scores on average in both classroom environment and sensitivity of teacher-child
interactions. But this group is similar to the good-quality group in that teachers in this group
reported low conflict with their children and an average number of instructional activities at least
three times per week. This group accounted for 30 percent of the classrooms examined in this
study (n = 436).

• Poor quality. This group is characterized by inadequate to minimal classroom environment scores
and by teachers displaying little sensitivity in their interactions with children. This group displays
scores similar to those of the other groups on the closeness and conflict scales. Based on these
features, this group could be described as an insensitive, somewhat harsh, environmentally poor
classroom group. This group accounted for 8 percent of the classrooms examined in this study (n = 116).
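
The group shares reported above follow from the group sizes and the sample accounting given earlier. A quick check in Python, using only numbers stated in the text (the variable names are illustrative):

```python
# Group sizes reported in the text.
group_sizes = {"good": 901, "fair": 436, "poor": 116}

# Sample accounting reported in the text: Head Start classrooms plus
# center-based classrooms, minus classrooms removed for missing data.
head_start, center_based, removed = 1061, 421, 29
analyzed = head_start + center_based - removed

# Share of analyzed classrooms in each group, rounded to whole percent.
shares = {group: round(100 * n / analyzed) for group, n in group_sizes.items()}
```

The three group sizes sum to the 1,453 classrooms analyzed, and the rounded shares are 62, 30, and 8 percent.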


Observation-based measures distinguish classroom quality patterns better than self-reported measures do 


The eight measures that were based on independent observations contributed more to the identification 
of the classroom quality groups than the five self-reported measures did (see the appendix for details on 
which measures were self-reported and which were observation-based). This can be seen by comparing 
the average scores on each measure across groups and by determining the extent to which the variation 
in scores on each measure is explained by the classroom quality groups (see Irwin et al., 2016, for a more 
detailed discussion). 


For the six classroom environment measures, which were scored by independent observers, average scores for
classrooms in the good-quality group ranged from good to excellent (or from about 5 to 6 on a seven-point
scale), average scores for classrooms in the fair-quality group ranged from minimal to good (or from 3.5 to 5),
and average scores for classrooms in the poor-quality group ranged from inadequate to minimal (or from 2 to 3.5).


In contrast, average scores for the self-reported measures of conflict and closeness in the student-teacher 
relationships category were similar across groups. The contribution of the measures to identifying quality 
groups was also examined by determining the extent to which the variation in the scores could be 
explained by the classroom quality groups. Self-reported measures explained 0.6-11.6 percent of the variation
in group membership, while observation-based measures explained 40.8-65.2 percent of this variation
(see Irwin et al., 2016, for a detailed discussion of this finding). 
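
One common way to quantify how much of a measure's variation is explained by group membership is the ratio of the between-group sum of squares to the total sum of squares (often called eta squared). The sketch below uses that ratio as an assumption; this summary does not state which statistic the full report used, and the scores are invented for illustration.

```python
from statistics import fmean


def variance_explained(scores_by_group):
    """Share of total variation in a measure explained by group
    membership: between-group sum of squares / total sum of squares."""
    all_scores = [s for scores in scores_by_group.values() for s in scores]
    grand_mean = fmean(all_scores)
    total_ss = sum((s - grand_mean) ** 2 for s in all_scores)
    between_ss = sum(
        len(scores) * (fmean(scores) - grand_mean) ** 2
        for scores in scores_by_group.values()
    )
    return between_ss / total_ss


# Invented scores: an observation-based measure that separates the groups...
observed = {
    "good": [6.0, 6.2, 5.8],
    "fair": [4.1, 4.3, 3.9],
    "poor": [2.5, 2.7, 2.3],
}
# ...and a self-reported measure that barely differs across groups.
self_reported = {
    "good": [1.0, 1.4, 1.2],
    "fair": [1.1, 1.3, 1.2],
    "poor": [1.3, 1.1, 1.3],
}
```

With these invented scores, group membership explains nearly all the variation in the first measure and almost none in the second, which mirrors the contrast the study reports between observation-based and self-reported measures.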


Implications of the study findings 


There are three main implications of the study findings. 
It is possible to use multiple dimensions of the classroom experience to identify classroom quality patterns 


The findings suggest that measures representing different elements of the classroom experience can identify 
classroom quality patterns and that classrooms can be grouped according to those patterns. The findings 
also suggest that the majority of classrooms can be described as good quality, while few can be described 
as poor quality. Practitioners and policymakers can use this information, along with the body of evidence 
currently available in the research literature, to inform how they can meaningfully interpret measures of 
classroom quality. In addition, an extension of this study could include a similar exploration using local 
data (in a collaboration between practitioners and researchers) and updated measures of quality to see 
whether similar patterns unfold or whether there are associations with teacher and program characteristics 
or academic outcomes, language development, or social skills. 


Identifying classroom quality patterns will likely require independent observers 


The findings suggest that practitioners and policymakers may not be able to rely on self-reported measures 
of classroom quality. The self-reported measures in this study showed little difference in their means across 
the three classroom quality groups identified. Where differences were apparent in self-reported instructional
activities, the measures best distinguished between two rather than three groups. In contrast, measures
based on independent observation using the Early Childhood Environment Rating Scale-Revised and the
Arnett Caregiver Interaction Scale distinguished among patterns of classroom quality according to the
amount of variation in group membership explained by each measure. As with any self-reported measures,
practitioners should consider the measures' limitations in differentiating classroom quality. If self-reported
measures are desired or deemed necessary, practitioners may need to look for self-reported measures that
better differentiate classrooms on quality.


An individual classroom may not be perfectly characterized by its classroom quality group 


Although it is possible to assign individual classrooms to classroom quality groups, classroom scores will 
likely vary on the individual measures. This suggests that practitioners should expect a variety of quality 
scores for individual classrooms within classroom quality groups. Practitioners could use classroom quality 
groups as a way to identify professional development opportunities that may be suitable for a particular set 
of classroom teachers, but practitioners may also need to consider the individual needs of specific class- 
rooms by examining the results on each measure. There are many possible explanations (and models) for 
variation across individual classrooms within a quality group, but even this small sample suggests that 
additional analyses could help shape policies surrounding classroom quality. 


Limitations of the study 


This study has three main limitations. 


First, the data used in the study are from the 2002/03 school year. In addition to improvements in the mea- 
surement of classroom quality since that time, there have been several national and numerous state-level 
preschool initiatives aimed at raising the quality of early childhood education (for example, the Race to 
the Top—Early Learning Challenge grant program, the Preschool Development Grants program, and state 
quality rating and improvement systems). 


Second, the study sample includes only classrooms serving Head Start-eligible children. The results may 
not be relevant to programs serving other populations of children (such as children from middle- or
upper-income families) and do not reflect the early learning experiences of children receiving home-based care.


Third, although latent class analysis identifies patterns of classroom quality, it does not account for observed 
within-group variation, which could be addressed using a more general factor mixture model that includes 
a factor analytic component. A factor analytic component would allow for the modeling of differences on 
the measures that occur within each quality group. The current model does not do that; it simply assumes 
that all the observed variation on the given measures is explained by the quality group to which a class- 
room best belongs. 


Appendix. Measures of quality used to find patterns of classroom quality 


This appendix describes the 13 measures of quality used to find patterns of classroom quality, which are 
divided into self-reported measures and observation-based measures. 


Self-reported measures 


Two self-reported measures were related to student-teacher relationships and are based on the
Student-Teacher Relationship Scale (Pianta, 2001). The short form of this instrument includes a conflict scale
(α = .92) and a closeness scale (α = .86), both of which were used in the current study.


Conflict. This measure represents the degree to which a child care provider perceives detachment, dis- 
agreement, and mistrust between himself or herself and a particular child. Example item: “The child feels 
that I treat him/her unfairly.” The measure contains eight items, which are rated on a five-point scale from 
definitely does not apply to definitely applies and were self-reported by teachers in spring 2003. Item scores 
are averaged to create the conflict score, which ranges from 1 to 5. For the current study, scores were aver- 
aged across all children in a classroom to establish a classroom score. 


Closeness. This measure represents the degree to which a child care provider perceives attachment and 
trust between himself or herself and a particular child. Example item: “This child values his/her relation- 
ship with me.” The measure contains seven items, which are rated on a five-point scale from definitely does 
not apply to definitely applies and were self-reported by teachers in spring 2003. Item scores are averaged to 
create the closeness score, which ranges from 1 to 5. For the current study, scores were averaged across all 
children in a classroom to establish a classroom score. 
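
The two-step averaging described above for the conflict and closeness measures (item ratings to a child-level score, then child-level scores to a classroom score) can be sketched as follows. The ratings are invented and the function names are hypothetical.

```python
from statistics import fmean


def child_score(item_ratings):
    """Average a teacher's item ratings (each on the 1-5 scale) for one child."""
    if not all(1 <= rating <= 5 for rating in item_ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    return fmean(item_ratings)


def classroom_score(ratings_by_child):
    """Average the child-level scores across all children in a classroom."""
    return fmean(child_score(items) for items in ratings_by_child)


# Invented conflict ratings: three children, eight items each.
conflict_ratings = [
    [1, 1, 2, 1, 1, 1, 2, 1],
    [2, 1, 1, 1, 3, 1, 1, 2],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
```

Because each child contributes one averaged score, a classroom score stays on the same 1-5 scale as the individual items.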


Three self-reported measures relate to instructional activities and are from the spring 2003 Head Start
Impact Study teacher survey. Because these are counts of activities, there is no reliability or validity evi- 
dence in the Head Start Impact Study documentation. 


Literacy activities. This measure represents the number of literacy activities (such as practice writing 
letters and words, retelling or making up stories, and learning about rhyming words) offered in the class- 
room at least three times per week, as self-reported by teachers. Scores range from 0 to 12. 


Math activities. This measure represents the number of math activities (such as counting out loud, using 
music to understand math, and working with rulers and other measuring instruments) offered in the class- 
room at least three times per week, as self-reported by teachers. Scores range from 0 to 7. 


Other activities. This measure represents the number of other activities offered in the classroom at least 
three times per week (such as working on arts and crafts, playing sports or exercising, and having children 
assist with classroom chores), as self-reported by teachers. Scores range from 0 to 4. 
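
The three instructional activity measures are simple counts. A minimal sketch, assuming a hypothetical survey record that maps each activity to the number of times per week it is offered; the record format, activity labels, and frequencies below are invented, and the actual survey may be structured differently.

```python
def activity_count(times_per_week, threshold=3):
    """Count the activities offered at least `threshold` times per week."""
    return sum(1 for freq in times_per_week.values() if freq >= threshold)


# Invented teacher responses for a few literacy activities.
literacy = {
    "practice writing letters": 5,
    "retell or make up stories": 2,
    "learn about rhyming words": 3,
    "listen to the teacher read": 5,
}
```

Here three of the four invented activities meet the threshold, so the literacy score for this record would be 3.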


Observation-based measures 
Six observation-based measures relate to the classroom environment and use the Early Childhood
Environment Rating Scale-Revised (Harms, Clifford, & Cryer, 1998). Internal consistency estimates for these
measures range from .72 to .92.


Space and furnishings. This measure represents the quality of the classroom environment’s space for 
routine care, play, learning, and privacy; furnishings for relaxation and comfort; and gross motor equipment 
and child-related displays. The measure contains eight items, which are rated on a seven-point scale from
inadequate to excellent based on the judgment of an observer in spring 2003. Item scores are averaged to
create the space and furnishings score, which ranges from 1 to 7.


Personal care routines. This measure represents the quality of routine related to greeting or departing, 
meals or snacks, nap or rest, toileting or diapering, health practices, and safety practices. The measure has 
six items, which are rated on a seven-point scale from inadequate to excellent based on the judgment of an 
observer in spring 2003. Item scores are averaged to create the personal care routines score, which ranges 
from 1 to 7.


Language-reasoning. This measure represents the quality of the classroom’s books and pictures as well as 
the caregiver’s encouragement of children to communicate, children’s use of language to develop reasoning 
skills, and the informal use of language. The measure has four items, which are rated on a seven-point scale 
from inadequate to excellent based on the judgment of an observer in spring 2003. Item scores are averaged 
to create the language-reasoning score, which ranges from 1 to 7. 


Activities. This measure represents the quality of fine motor, art, music/movement, nature/science, and 
math/number activities, as well as activities that involve blocks, dramatic play, sand/water, digital technolo- 
gy, and the promotion/acceptance of diversity. The measure has 10 items, which are rated on a seven-point 
scale from inadequate to excellent based on the judgment of an observer in spring 2003. Item scores are 
averaged to create the activities score, which ranges from 1 to 7.


Interactions. This measure represents the quality of the caregiver’s supervision of activities and discipline, 
as well as the teacher—child interactions and the interactions among children. The measure has five items, 
which are rated on a seven-point scale from inadequate to excellent based on the judgment of an observer 
in spring 2003. Item scores are averaged to create the interactions score, which ranges from 1 to 7. 


Program structure. This measure represents the quality of the classroom’s schedule, free play, group time, 
and provisions for children with disabilities. The measure has four items, which are rated on a seven-point 
scale from inadequate to excellent based on the judgment of an observer in spring 2003. Item scores are 
averaged to create the program structure score, which ranges from 1 to 7.


Two independent observation-based measures relate to sensitivity in teacher interactions with children and 
use the Arnett Caregiver Interaction Scale (Arnett, 1989). 


Harshness. This measure represents the degree to which a caregiver behaves harshly in their interactions 
with the children in their care. Example item: “Punishes the children without explanation.” The measure 
has nine items, which are rated on a four-point scale from not at all true to very much true based on the 
judgment of an observer in spring 2003. Item scores are averaged to create the harshness score, which 
ranges from 1 to 4. The internal consistency for this measure is .83.


Sensitivity. This measure represents the degree to which a caregiver behaves sensitively in his or her
interactions with children in his or her care. Example item: “Listens attentively when children speak to
him/her.” The measure has 10 items, which are rated on a four-point scale from not at all true to very much true
based on the judgment of an observer in spring 2003. Item scores are averaged to create the sensitivity
score, which ranges from 1 to 4. The internal consistency for this measure is .93.